HybridGO-Loc: Mining Hybrid Features on Gene Ontology for Predicting Subcellular Localization of Multi-Location Proteins
نویسندگان
چکیده
Protein subcellular localization prediction, as an essential step to elucidate the functions in vivo of proteins and identify drugs targets, has been extensively studied in previous decades. Instead of only determining subcellular localization of single-label proteins, recent studies have focused on predicting both single- and multi-location proteins. Computational methods based on Gene Ontology (GO) have been demonstrated to be superior to methods based on other features. However, existing GO-based methods focus on the occurrences of GO terms and disregard their relationships. This paper proposes a multi-label subcellular-localization predictor, namely HybridGO-Loc, that leverages not only the GO term occurrences but also the inter-term relationships. This is achieved by hybridizing the GO frequencies of occurrences and the semantic similarity between GO terms. Given a protein, a set of GO terms are retrieved by searching against the gene ontology database, using the accession numbers of homologous proteins obtained via BLAST search as the keys. The frequency of GO occurrences and semantic similarity (SS) between GO terms are used to formulate frequency vectors and semantic similarity vectors, respectively, which are subsequently hybridized to construct fusion vectors. An adaptive-decision based multi-label support vector machine (SVM) classifier is proposed to classify the fusion vectors. Experimental results based on recent benchmark datasets and a new dataset containing novel proteins show that the proposed hybrid-feature predictor significantly outperforms predictors based on individual GO features as well as other state-of-the-art predictors. For readers' convenience, the HybridGO-Loc server, which is for predicting virus or plant proteins, is available online at http://bioinfo.eie.polyu.edu.hk/HybridGoServer/.
منابع مشابه
Prediction of Protein Subcellular Multi-localization by Using a Min-Max Modular Support Vector Machine
Prediction of protein subcellular location is an important issue in computational biology because it provides important clues for characterization of protein function. Currently, much effort has been dedicated to developing automatic prediction tools. However, most of them focus on mono-locational proteins. It should be noted that many proteins bear multi-locational characteristics, and they ca...
متن کاملMulti-Label Multi-Kernel Transfer Learning for Human Protein Subcellular Localization
Recent years have witnessed much progress in computational modelling for protein subcellular localization. However, the existing sequence-based predictive models demonstrate moderate or unsatisfactory performance, and the gene ontology (GO) based models may take the risk of performance overestimation for novel proteins. Furthermore, many human proteins have multiple subcellular locations, which...
متن کاملPredicting subcellular localization of proteins in a hybridization space
MOTIVATION The localization of a protein in a cell is closely correlated with its biological function. With the number of sequences entering into databanks rapidly increasing, the importance of developing a powerful high-throughput tool to determine protein subcellular location has become self-evident. In view of this, the Nearest Neighbour Algorithm was developed for predicting the protein sub...
متن کاملPlant-mPLoc: A Top-Down Strategy to Augment the Power for Predicting Plant Protein Subcellular Localization
One of the fundamental goals in proteomics and cell biology is to identify the functions of proteins in various cellular organelles and pathways. Information of subcellular locations of proteins can provide useful insights for revealing their functions and understanding how they interact with each other in cellular network systems. Most of the existing methods in predicting plant protein subcel...
متن کاملA Multi-Label Classifier for Predicting the Subcellular Localization of Gram-Negative Bacterial Proteins with Both Single and Multiple Sites
Prediction of protein subcellular localization is a challenging problem, particularly when the system concerned contains both singleplex and multiplex proteins. In this paper, by introducing the "multi-label scale" and hybridizing the information of gene ontology with the sequential evolution information, a novel predictor called iLoc-Gneg is developed for predicting the subcellular localizatio...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 9 شماره
صفحات -
تاریخ انتشار 2014